Jigsaw-ViT: Learning jigsaw puzzles in vision transformer

Abstract

The success of Vision Transformer (ViT) in various computer vision tasks has promoted the ever-increasing prevalence of this convolution-free network. The fact that ViT works on image patches makes it potentially relevant to the problem of jigsaw puzzle solving, which is a classical self-supervised task aiming at reordering shuffled sequential image patches back to their original form. Solving jigsaw puzzles has been demonstrated to be helpful for diverse tasks using Convolutional Neural Networks (CNNs), such as feature representation learning, domain generalization and fine-grained classification. In this paper, we explore solving jigsaw puzzles as a self-supervised auxiliary loss in ViT for image classification, named Jigsaw-ViT. We show two modifications that can make Jigsaw-ViT superior to the standard ViT: discarding positional embeddings and masking patches randomly. Though simple, we find that the proposed Jigsaw-ViT is able to improve both generalization and robustness over the standard ViT, which is usually rather a trade-off. Numerical experiments verify that adding the jigsaw puzzle branch provides better generalization for ViT on large-scale image classification on ImageNet. Moreover, the auxiliary loss also improves robustness against noisy labels on Animal-10N, Food-101N, and Clothing1M, as well as against adversarial examples. Our implementation is available at https://yingyichen-cyy.github.io/Jigsaw-ViT.
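
The auxiliary-loss idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (that is at the URL above). The function names, the weight `eta`, and the `mask_ratio` parameter are assumptions made for illustration only. The sketch assumes the jigsaw branch predicts, for each patch it sees, the index of that patch's original position, and that the total loss is the classification cross-entropy plus a weighted jigsaw cross-entropy. Discarding positional embeddings is not modeled here; it matters because a branch that can read positions directly would make the puzzle trivial.

```python
import math
import random


def cross_entropy(logits, target):
    """Cross-entropy of one logit vector against an integer class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def shuffle_and_mask(num_patches, mask_ratio, rng):
    """Shuffle patch positions and drop a random fraction of them.

    Returns the original position index of each kept patch, in shuffled
    order. These indices double as the jigsaw targets.
    (Hypothetical helper; mask_ratio is an assumed hyperparameter.)
    """
    order = list(range(num_patches))
    rng.shuffle(order)
    num_keep = num_patches - int(mask_ratio * num_patches)
    return order[:num_keep]


def jigsaw_vit_loss(cls_logits, label, jigsaw_logits, positions, eta):
    """Classification loss plus eta times the mean per-patch jigsaw loss.

    eta is an assumed weighting hyperparameter for the auxiliary branch.
    """
    cls_loss = cross_entropy(cls_logits, label)
    jigsaw_loss = sum(
        cross_entropy(logits, pos)
        for logits, pos in zip(jigsaw_logits, positions)
    ) / len(positions)
    return cls_loss + eta * jigsaw_loss


# Usage: a 4x4 patch grid with a quarter of the patches masked out.
rng = random.Random(0)
positions = shuffle_and_mask(num_patches=16, mask_ratio=0.25, rng=rng)
# Stand-in model outputs: uniform logits for 3 classes and 16 positions.
cls_logits = [0.0, 0.0, 0.0]
jigsaw_logits = [[0.0] * 16 for _ in positions]
total = jigsaw_vit_loss(cls_logits, 1, jigsaw_logits, positions, eta=0.5)
```

With uniform logits, each term reduces to the log of the number of classes, so `total` equals `log(3) + 0.5 * log(16)`, which is a quick sanity check on the weighting.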

Similar resources

Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles

In this paper we study the problem of image representation learning without human annotation. Following the principles of self-supervision, we build a convolutional neural network (CNN) that can be trained to solve Jigsaw puzzles as a pretext task, which requires no manual labeling, and then later repurposed to solve object classification and detection. To maintain the compatibility across tasks...

Learning Image Representations by Completing Damaged Jigsaw Puzzles

In this paper, we explore methods of complicating self-supervised tasks for representation learning. That is, we do severe damage to data and encourage a network to recover them. First, we complicate each of three powerful self-supervised task candidates: jigsaw puzzle, inpainting, and colorization. In addition, we introduce a novel complicated self-supervised task called “Completing damaged jig...

Shotgun Assembly of Random Jigsaw Puzzles

In a recent work, Mossel and Ross considered the shotgun assembly problem for a random jigsaw puzzle. Their model consists of a puzzle: an n×n grid in which each vertex is viewed as the center of a piece. They assume that each of the four edges adjacent to a vertex is assigned one of q colors (corresponding to “jigs”, or cut shapes) uniformly at random. Mossel and Ross asked: how large should q = q...

Unique reconstruction threshold for random jigsaw puzzles

A random jigsaw puzzle is constructed by arranging n² square pieces into an n×n grid and assigning to each edge of a piece one of q available colours uniformly at random, with the restriction that touching edges receive the same colour. We show that if q = o(n) then with high probability such a puzzle does not have a unique solution, while if q ≥ n^(1+ε) for any constant ε > 0 then the solution i...

No Easy Puzzles: A Hardness Result for Jigsaw Puzzles

We show that solving jigsaw puzzles requires Θ(n²) edge matching comparisons, making them as hard as their trivial upper bound. This result generalises to puzzles of all shapes, and is applicable to both pictorial and apictorial puzzles.

Journal

Journal title: Pattern Recognition Letters

Year: 2023

ISSN: 1872-7344, 0167-8655

DOI: https://doi.org/10.1016/j.patrec.2022.12.023